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Abstract 

It is well known that CMB temperature anisotropies and polarization can be used to probe the 
metric perturbations in the early universe. Presently, there exist neither any observational detection 
of tensor modes of primordial metric perturbations nor of primordial non-Gaussianity. In such a 
scenario, primoridal power spectrum of scalar metric perturbations is the only correlation function 
of metric perturbations (presumably generated during inflation) whose effects can be directly probed 
through various observations. To explore the possibility of any deviations from the simplest picture 
of the era of cosmic inflation in the early universe, it thus becomes extremely important to uncover 
the amplitude and shape of this (only available) correlation sufficiently well. In the present work, 
we attempt to reconstruct the primordial power spectrum of scalar metric perturbations using the 
binned (uncorrelated) CMB temperature anisotropies data using the Maximum Entropy Method 
(MEM) to solve the corresponding inverse problem. Our analysis shows that, given the current 
CMB data, there are no convincing reasons to believe that the primodial power spectrum of scalar 
metric perturbations has any significant features. 



gaurav@iucaa.ernet.in 
^ j ayant i@iucaa.ernet . in 



Contents 



1 Introduction [l| 



Deconvolution Problem 

2.1 Formulation as an inverse problem 

2.2 Bayesian inversion 



8 



3 The Cambridge Maximum Entropy Algorithm 

3.1 Entropy and 

3.2 Construction of the subspace 

3.3 Optimization within the subspace 

3.4 Stopping criterion [12 

4 Recovering Primordial Power Spectrum 

4.1 The radiative transport kernel 

4.2 Recovering test spectra 

4.3 WMAP 7 year binned CMB data 



5 Summary and discussion [18 



A Testing the method [19 



1 Introduction 

Observations of Cosmic Microwave Background (CMB) temperature anisotropies as well as polarization 
[Ij can be used to uncover the physics of the early universe e.g. of cosmic inflation [21 El S]. However, 
calculations of power spectra of CMB anisotropies and polarization [5l [6] involve making a number of 
assumptions e.g. about the reionization history of the universe, the equation of state of dark energy 
etc. It is also usually assumed that the primordial power spectrum of scalar metric perturbations 
(denoted by sPPS in this work) is a power law (with a small running). One can then use the CMB 
observational data to put constraints on the values of various cosmological parameters [8] including the 
ones specifying sPPS (usually denoted by As, ns etc). Since this procedure leads to "resonable" values 
of these parameters, it is often said that a power law sPPS is consistent with the observed data. But 
it is worth noticing that this is just an assumption. 

Cosmic inflation is the most actively investigated paradigm for explaining the origin of anisotropies in 
CMB sky as well as the large scale structure of the universe. The simplest versions [2l|3ll3j of inflationary 
models give a smooth, nearly scale- invariant (tilted red) sPPS. But there are other models which are 
capable of giving more complicated forms of sPPS (abnormal initial conditions, multifleld models, 
interruptions to slow roll evolution, phase transition during inflation, see e.g. [HI |T2l [131 Ull 113 [16]). 
Are these models ruled out by the present data? Thus, even though power law sPPS is consistent 
with the data, the assumption of a power law PPS (with small running) is just that: a well motivated 
assumption. It is worth checking, how the models in which sPPS is not just a simple power law with a 
small running fare against the present available data. 

This can be done in various ways: e.g. one could try to redo cosmological parameter estimation 
with the actual form of sPPS left free (see e.g. [9]). Another option is to work with inflationary models 
which lead to features in sPPS and redoing parameter estimation for those models (see e.g. [lUt lllj). 
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This exercise illustrates that (i) models in which sPPS is not this simple also do fit the data, (ii) very 
often, with these models, one can get a better fit to data than power law with small running. 

Given this situation, a reasonable possibility is to try to directly deconvolve sPPS from observed 
CMB anisotropics ( i.e. Qs). Previous attempts [l7j at doing so seem to suggest the existence of 
features in sPPS (the statistical significance of which is still being assessed [IS]), e.g., a sharp infrared 
cutoff on the horizon scale, a bump (i.e. a localized excess just above the cut off) and a ringing (i.e. 
a damped oscillatory feature after the infrared break). This is consistent with many existing models 
of inflation and this has also motivated theorists to build models of inflation that can give large and 
peculiar features in primordial power spectrum (see pT ] \T2 \ [TS j [M j [T5l [TBj ) . 

Given the fact that primoridal power spectrum of scalar metric perturbations is the only cosmological 
correlation whose effect is, at this stage, observable in the universe (Primordial non Gaussianity is yet 
to be detected in CMB data, so are B modes of polarization of CMB due to inflationary Gravitational 
waves), it becomes important to settle this issue of possible existence of features. 

In the present work, we try a new method of probing the shape of primordial power spectrum: 
the Maximum Entropy Method (MEM). We begin in ^ by broadly describing the problem and its 
various attempted solutions. Then, in ^ we describe in detail the algorithm that we have used. This 
is followed by ^in which we apply the algorithm to binned CMB temperature anisotropies data. We 
conclude in ^with a discussion of salient features, limitations and future prospects for the work. In 
the appendix we present the results of applying the method on a toy problem and in the process 
illustrate the use of the algorithm. 



2 Deconvolution Problem 

2.1 Formulation as an inverse problem 

We address the issue of reconstructing the shape of the sPPS by attempting to directly solve the 
(noisy) integral equations giving the CMB angular power spectrum using MEM. The observed CMB 
TT angular power spectrum is given by (see e.g. p^): 



POO 

Q obs = / dk 
Jo 



Pm + Crnoise (1) 



here, £ is the multipole moment, k is the wave number and the quantity in the square brackets is the 
radiation transfer function (ryo denotes the value of conformal time today) and P^{k) = ^^(|$(/c)p) 
is the power spectrum of the scalar metric perturbation in Newtonian gauge (often called Bardeen 
potential, Assuming a given set of values of background cosmological parameters, the radiative 
transport kernel can be found (see we can then formulate the problem we are dealing with as the 
solution of a set of integral equations i.e. as an inverse problem. 

The scalar primoridal power spectrum is the power spectrum of comoving curvature perturbation: 

P[k)^Pn{k) = ^mk)f) (2) 

here TZ{k) is the mode function H of the comoving curvature perturbation on super-Hubble scale (when 
it has become frozen). For a power law sPPS, 



k 



ns-l 



P{k) = ^5 • ( ^ ) (3) 



''so it has mass dimension —3/2 rendering P(k) dimensionless. 
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In matter dominated universe (at the time of recombination), at linear order in perturbation theory, 
$ = (3/5)7^ , so, for a power law PPS, Cj^ should be (in /iK^) 



CTT rp 2 

e theory " -^CMB 



q \ ^ poo 



47r 
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(4) 



We shah now replace Tcmb^ ' (t)^ (|^) 
f{k). We thus have 



by a general function f{k) and try to find this function 
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with 



f{k) = Tcmb' 



(At(^,A;,?7o))^ 



P{k) 



fik) 



(5) 
(6) 



and so the function f(k) shall have values of the order of magnitude of 10'^. 

Given the temperature radiation transfer function (At(£, /c, r/o)), the theoretical Cf^ can be found 
from Eq [1] provided, we know the sPPS. The i range for which we wish to evaluate the transfer function 
and the corresponding C^s goes from i = 2 to £ = /max = 1500. The typical behaviour of the function 



Gie,k) = dk—iAT{l,k,rjo)f 



(7) 
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Figure 1: Typical behaviour of the kernel Glk for low £ values. 

is shown in the Fig ([1]) (with dk chosen such that the integral in the definition of Ci can be evaluated 
to a high enough accuracy). For every given I, the radiation transport kernel is a highly oscillatory 
function of the wavenumber k. But for any £, it has significant (i.e. non-negligible) values only within 
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a small range of k values. The brightness fluctuations roughly go as je[k{i]o — V*)] (where ji is spherical 
bessel function while 77* is the conformal time at the epoch of recombination), thus the minimum value 
of £ sets a minimum value of k at which the kernel takes up non-negligible values. This procedure tells 
us that since the radiation transfer function is neglible for k < femirn no matter how much power is there 
in sPPS at very small /c— values, the CMB anisotropics cannot be used to probe the sPPS at these (very 
large scales). This sets the /cmin below which we cannot probe the sPPS. Similarly, given the fact that 
we have observations only till a maximum value of i, this sets the maximum value of k upto which we 
need to sample the kernel: thus, the smallest possible angular resolution of a CMB experiment shall 
set the Imax that we can probe which shall set a /cmaxj i-e. sPPS at scales smaller than this scale can 
not be probed by CMB experiments. Thus, £ = 2 determines kmm while £ = /max determines femax- 
Within this range, one discretizes the A:— space in such a way that the transfer function can be sampled 
sufficiently well and the above integral can be performed to the desired accuracy. 

Apart from this consideration, the actual observed Cis are also noisy (due to cosmic variance, 
instrumental noise and the effect of masking the sky). Thus Eq ([1]) can be written as a set of linear 
equations 

ris 

Ci = ^ Gikfk + Ci (8) 

k=l 

where Us is the number of bins in fc— space and Cf is the noise term. Thus the problem we wish to 
solve is: given the matrix G, the few observations (C^s), the moments of the random variables Cf , 
how can we find the set of numbers /fc? In this paper, we shall use the binned CMB data to find sPPS. 
The number of (binned and hence uncorrelated) data points (WMAP) is 45 (call it n^). To sample the 
kernel satisfactorily, we divide the k space into 6200 points (n^). Thus, we have a problem with a set 
of 45 noisy linear equations and 6200 unknowns to be determined. 



2.2 Bayesian inversion 

Recovering the primordial power spectrum fk from the observed Ci can be casted as a Bayesian inversion 
problem in the following way. The posterior probability P{fk\Ci, Gik) of obtaining the primordial power 
spectrum given a kernal Gik and observed Ci is given by: 

P{h\CuG,,) = ^™^^ (9) 

where P{Ci\fk,Gik) is the likelihood and P{fk) is the prior probability. For our case the denominator 
(evidence) works just a normalization and we can ignore it. 

For the case of Gaussian noised the likelihood function can be written as 

P{Gi\h,Gik)^eM-xV'A (10) 

where 

for the case when the noise covariance matrix is diagonal. 

As we shall see in section |4l this number is 6200, thus our Gik matrix shall have dimensions 1499 x 6199 ( = 
9292301 entries). 

^ Even though the noise on Ces is not Gaussian, we proceed pretending the noise to be Gaussian. This is justified 
because by the central limit theorem: since the chi-squared distribution is the sum of Ud independent random variables 
with finite mean and variance, it converges to a normal distribution for large Ud- 
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Since for our problem the number of unknowns i.e., fk are far more than the number of knowns i.e., 
Cis therefore ordinary chi square minimization is of no use since it can make the chi square too fowlfl . 
In order to avoid chi sqaure taking unphysical values we need some form of regularization in the form 
of prior. In place of maximizing the likelihood function we maximize the posterior probability. 

It has been a common practice to conside the following form of prior for any regularization problem 

P{fk) P{fk, A, S) = exp[-AS(/fc)/2] (12) 

where A is the regularization parameter and S is the regularization function. There have been many 
form of regularization function like quadratic form etc. 

In the present work we use an Entropy function S{fk) as a regularization function which is defined 
in the following way 

S{fk) = -Y.fk 

k 

where A is a parameter which parameterizes the entropy functional we use. 

With the regularization function the posterior probability distribution can be written as 

P{fk\Gik, Ci) = exp[-xV2] * exp[-A5/2] = exp[-(x2 + \S)] = exp[-M(A)] (14) 

where 

M{h) = \{x^ + \S{h,A)) (15) 

Maximum entropy method is a particular (nonlinear) inversion method. Here the regularization 
function S{fk,A) is non-quadratic so that the equations to be dealt with to solve the optimization 
problem shall turn out to be non-linear. Without such a maximum entropy (ME) constraint, the 
inversion problem is ill-posed (since the data can be satisfied by an infinity of primordial power spectra). 
The condition that the entropy be a maximum selects one among these. There exist, in the literature, 
various arguments justifying the use of MEM over other ways of inversion (often using arguments from 
information theory ), at this stage, we just treat it as just another nonlinear version of the general 
regularization scheme. 



In 



(13) 



3 The Cambridge Maximum Entropy Algorithm 

So, the problem that we wish to solve involves a highly under-determined system of linear equations. 
As was mentioned in the last section, one way in which we can attempt to solve this problem is to 
formulate it as a problem involving the optimization of a non-quadratic function (which will require 
solving a set of non-linear equations) subject to a constraint. Since the number of unknowns is so 
large, we have to solve the corresponding constrained non-linear optimization problem in a very large 
dimensional space. Also, we have other constraints that we need to take care of e.g. the components 
of / are positive quantities (since / is a power spectrum), so the optimization algorithm that we use 
must not cause the components of / to become negative (this requirement rules out methods such as 

^even if we knew all the Us parameters, the presence of noise shall ensure that shall be a sum of Ud normalized 
Gaussians. 

^It is often argued that while the maximum likelihood method approach selects the spectrum that has the largest 
probability of reproducing the data, the maximum entropy method, instead, selects the positive spectrum to which is 
associated the largest number of ways of reproducing the data, i.e., the one that maximizes the information-theory 
definition of the entropy of the spectrum subject to the given constraints. 
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the steepest ascent). Similarly, since the the objective function is quite different from a pure quadratic 
form, methods such as conjugate gradient method are not very useful. 

Experience has shown that one of the strategies which work (despite being complicated) is the 
following: instead of searching for a minimum in a single search direction (e.g. in steepest ascent 
method), one searches in a small- (typically three-) dimensional subspace. This subspace is spanned 
by vectors that are calculated at each point in such a way as to avoid directions leading to negative 
values. The algorithm that we use is based on the one developed by Skilling and Bryan pTl [22] and is 
sometimes referred to as The Cambridge Maximum Entropy Algorithm. It has been extensively used 
in not only radio astronomy but also in other fields. Here we quickly review this algorithm for the sake 
of completeness. 

3.1 Entropy and ^ 

The problem to be solved involves finding a set of (A: = 1, 2, • • • , n^) (with maximum entropy) from 
a dataset Dg (£ = 1, 2, • • • , n^). For any /fc, let 

Fi = Y,Gikh (16) 

k 

We shall use the following definition of entropy (the non-linear regularization function) 

S = -Y. fklHfk/A) -1] = -Y,fk Hfk/eA) (17) 

k k 

here, j4 is a fixed number (sometimes called "the default") that sets the normalization of /. Notice 
that S{0) = 0, S{fk = A) = Hg ■ A, S{fk = eA) = 0. This gives, (since A is fixed), 

dS/dfj = logiA/f,), d^S/dfidf, = -6,j/fj . (18) 

telling us that diS{0) = oo, diS{fk = ^) = and diS{fk = eA) = —1. It is easy to see that entropy 
surfaces are strictly convex. Also, the expression for the various derivatives of the entropy tell us that 
the solution fi = Ais the global maximum of entropy, this fact shall be important later. The measure 
of misfit that we shall use (in order to use the data) is the Chi-squared function 

Cif) = x' = ^{F,-D,f/ai^ (19) 

e 

from which we get, the gradient of C 

dC/df, = Gij 2 {Fe - De)/ae^ (20) 

and the Hessian 

d''C/dm,=J2Gej (^)G,i. (21) 

For a linear experiment, the surfaces of constant chi-squared are convex ellipsoids in N-dimensional 
space. The largest acceptable value for at 99 percent confidence is about Caim = nj. + 3.29y^ 
(with rill being the number of observations), see [21]. As the above equations show, quantities such as 
gradient of C and Hessian of C can be easily evaluated (though finding the Hessian of C is the one of 
the most computationally expensive tasks since the matrix Gik is 45 x 6200 and Hessian of C shall be 
6200 X 6200 matrix). 
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At every iteration, instead of searching for the maximum of S and minimum of C along a hne, we 
search in an n dimensional subspace of the parameter space. So, instead of 

/Vew)=r + ^e* (i = l,2,... (22) 

we shall have (with being n search directions) 

n 

r(new)=r + E ^''^M (23) 

Sufficiently near any point, every function can be approximated by a quadratic function (provided the 
higher order terms in the Taylor expansion can be ignored). So, within the subspace we shall model 
the entropy and chisquared by 

5(/ + ^xe)-5(x) (24) 
C(/ + J]xe)^c(x) (25) 

where s{x) and c(x) are quadratic 

s{x) = s{0) + s^x'' - g^uxi'x^/2 (26) 

c{x) = c(0) + Y Cm^'' + Yl V^''a;72 (27) 



which correspond to the first three terms in the Taylor series expansion of S{f) and C{f). The first 
order term in the Taylor expansion of S is 



u i=l ^ •' ^ 



which tells us what should be. Similarly, c^, g^j^i, and h^i, can be found: 

dC 

9ixv = - Y ^M^^ Q Jig p (29) 

V = (30) 

Thus, if we know the basis vectors, we can find the quadratic functions s{x) and c{x). 
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Obviously, the above definitions shall not be valid to arbitrary distances from the point in question. 
The quadratic models are reliable only in the vicinity of the current / where cubic and higher powers 
can be nelgected. Thus, the step size at each iteration must be such that 

l^/P < Zo' (31) 

for some Iq. We thus need to define the concept of distance in this abstract space. Recall that this 
means we need to define a metric 

ds^ ^J^dvdf'^^f (32) 

note that the metric gij is different from the function g^i, defined by Eq. (|26|) . Experience (see |21j ) 
has shown that the following definition of distance works well 

9^^ = ji (33) 

this needs to be compared with the expression for the Hessian matrix of entropy (notice that g^^ = /M*-'). 
It is straightforward to show that 

ds^ = E 9^,dfdf = ^ g^.x'^x-' (34) 
ij ^lv 

while choosing Iq^ to be 1/5 of ^ / works well (see |21j). The algorithm works in the following way: at 
every iteration, when we are at a point in the / space, one considers a distance region s.t. the quadratic 
model is a good approximation in that region. We now find a subspace and within this subspace, we 
try to find the place where 

1. s{x) is maximum, 

2. c{x) equals some Caim, and, 

3. the distance of this new point from the old point is smaller than ^o- 



3.2 Construction of the subspace 

So, how do we decide the basis vectors which span the subspace? One of our aims is to find the maximum 
of entropy on the surface of ellipsoid corresponding to = Caim- So, naturally, the direction of gradient 
of entropy must be one of the basis vectors. Since the metric in the space of interest is not Cartesian, 
there shall be a distinction between contravariant and covariant components of vectors in the space. 
Since the "position vector" of any point is a contravariant vector, gradiant such as dS/dp is going 
to be a covariant vector. So the first (contravariant) basis vector is 

^\ = p"§ = r§- (35) 

The meaning of this direction is easy to understand by recalling its equivalent in usual Cartesian space. 
In the usual situation, (VT) • ndr = dT (i.e. if we are at any point, and we go in the direction n by a 
distance of dr, the change in the value of the function is dT). It is obvious from this expression that 
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when h is parallel to the direction of gradient, the change df is maximum. Thus, to maximize the 
change in /, we shall move in the direction parallel to VT so that 

j 

This equation should be compared with the definition of the first basis vector, Eq j35| (and since the 
kronecker delta is the metric in a Cartesian space, the two equations are equivalent). Thus, the first 
basis vector tells us the direction in which the entropy change per unit distance is maximum. 
Similarly, another basis vector could be 

^i = p"w = rW' 

since we wish to change the every iteration so that we eventually reach the = Caim surface. If 

we find what the two search directions (defined above) become after incrementing by x^ei + x'^e2, the 
direction ei shall stay within the subspace spanned by ei and 62 but the direction 62 shall go out of 
the subspace (see [22]). This suggests that we choose more basis vectors such as 



4 = rE^i d^C/dfdf, (38) 

j 

ei = f^ei d^C/dfdf (39) 
j 

Experience has shown that a family of three or four such search directions gives quite a robust algorithm 
for solving the problem. In our problem, we chose the third search direction to be 

where, the following Eqs define the lengths Lg and Lc which are the gradient vectors 



1 1 



L =(- — —\ ' L =(-'^ — —\ ' (41) 

our experience has shown that putting the factors of Lg and Lc in the definition of the third basis vector 
improves the speed of convergence of the answer. 



3.3 Optimization within the subspace 

Once we have found the subspace (by finding the basis vectors in the space of all /s), we proceed 
as follows: we now wish to find the step, the coefficients x in Eq ()23p . To do this, we shall solve a 
corresponding constrained optimization problem in the n dimensional subspace (as was stated in the 
previous subsection, we worked with n = 3, but we shall continue to explain the details for a general 
n). Since the functions s{x) and c(x) are quadratic, the problem in the subspace is much simpler: it is 
a simple problem of quadratic programming (quadratic objective function with quadratic constraint). 
The only additional complication is that the quadratic model is not valid to arbitrary distances from 
the original point, so we need to satisfy an additional distance constraint. 
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Let us begin by recalling that both the matrices g and h are real-symmetric. Also, the matrix g is 
positive definite. The reason is as follows: the way we have defined the metric on the space (see Eq 

m ), 

ds^ = ^ -g,,drdf = ^ ^ = g^^x^x- > (42) 

ij i iJLV 

where in the last step we used Eq p4p . So, it is clear that g^y is a positive definite matrix (which 
imples that all its eigenvalues are positive). Thus if one of the eigenvalues of g^^ is a small positive 
number, numerical errors can cause it to become negative. In our implementation of the algorithm, 
we choose to ignore any directions which are defined by eigenvalues which are too small. Additional 
simplification occurs if we simultaneously diagonalize the two matrices g and h. |f| 

After simultaneous diagonalization, within the subspace, the quadratic model functions S and C 
are given by 

S{x) = so + Ys^x^-]^'Yx^,^, (43) 

C{x) = Co + ^C/,x^ + ^ J^Tm^^m'- (44) 

the quantities etc are now defined in terms of the new basis vectors (but the same old definitions). 
Since this causes the function g^^i^y to become a Kronecker delta, the distance constraint Eq. looks like 

^' = IZV^^o' (::^0.lj;/to0.5j;f) (45) 

We chose the coefficient on the RHS to be 0.2 and we verified that the actual value of this number 
is unimportant. Typically, the function C is such that all its eigenvalues are (also) positive, then, the 
minimum value of the function C in the subspace (where the above definitions work) is 

1 

Thus, no matter what the global aim Caim is, at a given iteration, within the subspace, we can not get 
to any values below Cmin- In fact, even trying to achieve Cmin is not a great idea since in that case we 
shall not use any information about S. 

The real challenge in the subspace is to satisfy the distance constraint. Many different elaborate 
tricks have been mentioned in the literature to do this. We choose to not worry about getting a quick 
answer, hence we do the following: in order to ensure that the distance constraint always gets satisfied 
(i.e. we do not go too far from the present location in just one step), we shall choose to have a Caim 
which is not too different from Cq (the present value of x^)- We thus choose 

Caim = max(aCmin + (1 - a)Co, Caim) (47) 



* Simultaneous diagonalization Theorem: If A and B are real symmetric matrices and B is positive definite, then 
there exists an invertible matrix P s.t. BP — I and P^AP is diagonal. The diagonal entries of A are the roots of the 
polynomial det(2;_B — A) = 0. Notice that this is different from finding a basis in which both are diagonal. Here, if B is 
chosen to be identity matrix, then P is orthogonal and P^AP is the same as P~^AP. This leads to the unique diagonal 
representation of the matrix (with the eigenvalues as the diagonal values). Recall that the eigenvalues of a matrix M are 
the solutions of the equation det(A/ — XI) — 0. There exist stable numerical algorithms to achive this (see e.g. page 463 
of '2T) which take in the two real symmetric matrices A and B and returns the (non-singular) matrix P and the diagonal 
matrix P^AP. 
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Figure 2: The typical behaviour of C{a) = C{a) — Caim as we change a. The exitence of a unique 
solution to the Eq C (a) = is absolutely necessary for the algorithm to work. Since C (a) changes very 
quickly as we change a, we need to find the root of C{a) = to a high accuracy. 



with a chosen to be a small number (e.g. 0.01). This causes the algorithm to take very small "baby 
steps" towards the answer. Numerical experience has shown that as far as our problem is concerned, 
this is good enough. Of course, the actual value of a or Caim chosen is not important as long as the 
distance constraint gets satisfied. 

The problem in the sub space is thus simplified to finding the point Xa such that the function S 
is maximum subject to the constraint that C = Caim (and an additional constraint that the distance 
constraint must get satisfied). The techinique of Lagrange's undetermined multipler is useful here: we 
wish to find the point on the curve C = Caim where S is maximum, to find the desired point, we 
consider the set of points at which all the partial derivatives of the function 



Q = aS-C 

(for an undetermined a) vanish. For any a, such points are given by 

_ - Cg 

la + a 



(48) 



(49) 



So, for every value of a, find the value of Xa and then the function C: we are after that value of a 
which leads to C = Caim so we look for a solution of the equation C(a) = Caim- The function C(a) is 
a monotonically increasing function of a, see fig. ([2]). Since the function C{a) — Caim often happens to 
be a quickly changing function of a (especially while it is changing its sign), the solution for a needs 
to be found to a high tolerance level. 
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Figure 3: (i) The typical evolution of entropy and x'^ as the algorithm advances and (ii) the illustration 
of the fact that the angle 9 (in radians) drops very quickly as the algorithm proceeds. 

3.4 Stopping criterion 

In solving the constrained optimization problem, one fact which becomes important is the following: 
at the point at which the constrained optimization problem gets solved, the gradient vectors of the two 
functions become parallel. Thus, if we find the unit vector in the direction of gradient of entropy and 
in the direction of gradient of chi squared function, the dot product of these two unit vectors (defined 
using the entropy metric) must become negligible as we head towards the point at which the constrained 
optimization problem gets solved. The unit vectors in the directions of gradients are (with Lg and Lc 
defined previously) 



We thus expect the angle 



cos'' {-g Ut (51) 



to become too small (compared to a unit radian) as the algorithm proceeds (Fig([3])) 



4 Recovering Primordial Power Spectrum 

In this section, we shall (i) test the formalism presented in the previous section by trying to recover a 
featureless as well as feature-full sPPS from simulated noisy CMB data and (ii) apply the algorithm to 
actual WMAP 7 year binned TT angular power spectrum \\\ to recover the Primordial Power Spectrum. 
Thus, to begin with, we shall find out the radiation transfer function for the simplest set of assumptions, 
inject a featureless sPPS and get noise- free Cj'^ (which we shall refer to as theretical C^s). Next we 
shall add noise to these pure C^s. 

4.1 The radiative transport kernel 

First, we need to set the values of the various cosmological parameters and get the corresponding 
radiation transfer function. This can be done by making use of the codes such as CMBFAST [5j, 
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CAMB [6j or gTfast The results in this section are got from transfer function found using the 
code gTfast which itself is based on CMBFAST (version 4.0). It is important to notice that since 
in this work we shall only use the TT data, so, we only calculate the temperature radiation transfer 
function. To find out the transport kernel, we assume that the universe is spatially flat and dark enery 
is a cosmological constant (i.e. we have a spatially flat ACDM universe) and set the values of the 
cosmological parameters to their WMAP nine year values [1] (WMAP9 + bao + hO): the values of 
various parameters to be fed into the code gTfast are given in table [TJ We also assume that there are 
no tensor perturbations to the metric. We use Peebles recombination (rather than using RECFAST) 
and assume that the Primordial flutuations are completely adiabatic. Finally ,we shall not correct the 
transfer function for lensing of CMB, SZ effect or other effects that cause secondary ansiotropies of 
CMB. This shall give us the radiation transfer function from which we can easily evaluate the matrix 
Gik- For the case we are dealing with, the matrix Gik shall have dimensions 1500 x 6200. Finally, 
we would like to state that the results one obtains and conclusions that one draws should better not 
depend on the exact values of these parameters. 



Parameter 


value 


Imax 


1500 


ketamax 


3000 


nb 


0.0472 




0.2408 


nA 


0.712 




0.0 


Ho 


69.33 km/s/Mpc 


Tcmb 


2.72548 K 


Yue 


0.308 


Ny (massless) 


3.04 


Ny (massive) 


0.0 


T 


0.088 



Table 1: The values of various parameters for the run. 



4.2 Recovering test spectra 

We can now inject a test sPPS which is a power law with As = 2.427 x 10^^, = 0.971 (with 
ko = 0.002Mpc^^ and Tcmb = 2.72548 x 10^/iK) and get the corresponding theoretical C^s, and add 
noise. The noise we add is dominated by cosmic variance at low ^ (less than 600) values while for high 
I values, the noise is dominated by instrumental errors. Fig Q shows the result of using the algorithm 
described in the previous section to recover the sPPS in the present case. The following points are 
worth noting: 

1. To get the result shown in Fig @, we set the parameter A in Eq (|17p to be 5.4 x 10^. As was 
stated, the solution fi = A\s the location of global maximum of entropy in the / space. From 

Tcmb'-Q) ■P{k) = f{k) (52) 

it is clear that f = A = 54000 corresponds to P{k) being 2 x 10~®. Thus, this value of A 
corresponds to the situation in which P{k) = 2 x 10~® is the solution with the maximum value 
of entropy. 
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2. In an actual CMB experiment (such as WMAP) the amount of noise (instrumental as well as that 
due to cosmic variance) is not the same for all scales, which means that our data is not equally 
good for all values of k. Fig (jH) shows that at scales at which the noise is large (very low and 
very high i values which will correspond to very low and very high k values), the recovered f{k) 
tends to approach the value A, the recovery (at these scales) tends to be poor. Thus, at scales 
at which the noise is too large (or the kernel takes up negligible values), the recovery depends on 
what is the prior information we have about the solution. Thus the range of k values in which we 
can recover the PPS is too restricted. 

3. Even at scales at which the noise is smaller (and at which we hope to recover well), we can have 
wiggly artificial features in the recovered PPS (in the form of peaks and dips). In the recovered 
power spectrum there could exist three kinds of features: (i) those which are actually there in 
the injected PPS (which are not there in the present case), (ii) those which are not there in the 
PPS but got introduced by the algorithm itself (these shall change as we change A) and finally, 
(iii) those which are artefacts of the added noise (a particular realization of the noise shall have 
outliers, if we consider different realizations of the noise, we shall get different recoveries). 

4. The scales at which we typically introduce features in the sPPS are roughly 10~^ MPc~^ to 10~^ 
MPc"^ We would hke the recovery to be good at these scales. If we have data till very large value 
of £, and the noise at these large i values is very low compared to the noise at is corresponding to 
the above scales, the algorithm shall ignore the few data points with larger noise and try to only 
take the data at the other scales seriously. Thus, if we wish to recover better at these scales we 
must focus on recovering the PPS using only the data from the i values corresponding to these 
scales. Thus, having data till larger values of multipole moment with lesser noise may not help. 

1e-07 ^ ^ ^ ^ ^ ^ 




1e-10 ^ ^ ^ ^ ^ ^ 

1e-05 0.0001 0.001 0.01 0.1 1 

log k 

Figure 4: Recovery (green curve) of an injected featureless tilted red sPPS (the red line) using simulated 
unbinned CMB data. The artificially added noise is dominated by cosmic variance for small (upto 600) 
i values and by instrumental noise at larger i values. This result is obtained when the parameter A in 
Eq (dZD is set to the value 5.4 x 10^. 
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Figure 5: Recovery of an injected featureless tilted red sPPS using simulated binned data. The red 
straight line is the injected signal while the different curves correspond to different values of A. Since 
we do not know how to fix the solution corresponding to Global maximum of entropy, we can not know 
the value of A. 



1e-08 




Figure 6: The recovery at scales at which the data has lesser noise is not given by fi = A but is dependent 
on the specific realization of the noise added. The recovery in this case is done for A = 2252.0. 
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Figure 7: Recovery of spectrum with bumpy features. Here, A is set to 9000. The red and blue curves 
are the injected spectra while green and pink ones are the recoveries. Had we introduced a feature at 
scales where the recovery goes back to the global maximum of entropy (A) we could not have recovered 
it. 

In practise, the process of masking the sky causes the various C^s to get correlated. The simplest 
situation in which we can hope to recover the sPPS is the one in which the the data points corresponding 
to different i values are uncorrelated. This happens for the binned CMB dataset (which has data only 
for 45 i values). To make use of the binned data, we also work with a binned kernel which is defined 



where N is the number of i values in the bin. By using this averaged kernel and applying the algorithm 
to simulated binned data (with the added noise equal to the noise for WMAP 7 year binned data), 
we get the results shown in Fig ([5]) (this time we show the results for many A values). We again get 
an answer which at scales at which the noise is large, tends to the value of the default (i.e. ^4) while 
at scales at which the noise is relatively low, the recovery tends to fluctuate around the featuresless 
injected signal. For a fixed value of A, the recovery at scales at which the noise is relatively lower 
shall be different if we consider different realizations of the noise. This is illutrated in Fig ([6]): here 
the recovery shall be the same at scales with no data and shall be different at scales with data. The 
key question is whether we can recover features in the sPPS by this method. The fact that this can 
be done is illustrated in Fig ([71): we just introduce a bumpy feature between the scales 10"'^ MPc~^ to 
10-2 MPc"^ and vary its height and see that unless the height of the bump is too small, the algorithm 
can recover it. Of course if we introduce a feature at a scale at which the data is not good or at which 
the kernel takes up negligible values, the feature shall not be recovered. Moreover, it is not surprising 
that the recovery is much better if the feature is more prominent. 




•mill 



(53) 
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4.3 WMAP 7 year binned CMB data 



In this sub-section we apply the algorithm to actual CMB data. We use WMAP 7 year binned TT 
dataset and use it to recover the sPPS. The result is shown in figEJ The details of the recovery of course 
depend on the chosen value of the parameter A. In the present context, the value of A represents our 
a priori knowledge (without using any data) of how much we think should be the scalar fluctuation in 
the metric in the early universe. 



1e-07 




1e-10 ' ■ ■ ■ ■ ■ —-^ 

1e-05 0.0001 0.001 0.01 0.1 1 

log k 

Figure 8: The result of applying the algorithm to binned WMAP 7 year TT data. The solid black 
straight line corresponds to the Maximum Likelihood result that one gets if one assumes the sPPS to 
be a power law. The curves correspond to the following values of ^: A = 54000 (red), A = 15000 
(pink), A = 3000 (blue), A = 500 (green). 

It may appear that if the conclusion depends on such an a priori knowledge, we may not get anything 
worthwhile. But the following fact is worth noting: it is seen that at scales at which the noise is lesser, 
even though the recovered P{k) depends on the value of A chosen, this dependence is quite weak and 
quite predictable (as we increase A a lot or decrease it a lot, the recovery just "stretches" in the P{k) 
direction in the In P — In A; plane). An interesting exercise is this: if, without using the CMB data, we 
still knew that the amplitude of the scalar metric perturbations is (roughly) Ag, then what can this 
method of deconvolution tell us about the sPPS? Fig Q illustrates how the red tilt of the PPS can 
be detected in such a case. One can keep on decreasing the value of A and see what happens. In this 
context, the case of ^ = 1 is very interesting since this corresponds to using another familiar definition 
of entropy, the recovery for this case is illustrated in Fig ()10p . What is interesting is that if we choose A 
to be too small, we begin to get an IR cut-off not very different from the one reported in the literature 
previously (see [IT]), but, we also get an apparent UV cut-off. Moreover, such a small value of A 
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causes the artifical features to get stretched so much that we may not consider the reconstruction to 
be trustworthy in this case. 




Figure 9: The red hue is the WMAP ML power 
law sPPS. If we set A = Ag (thus, the blue line 
is the solution corresponding to global maxi- 
mum of entropy), we recover the green curve 
shown. The range of logP(A;) axis is from 
2.0 X 10-9 to 3.0 X 10-9. 
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Figure 10: Choosing A = 1.0 causes the fea- 
tures to get overly "stretched " and we find ap- 
parent IR and UV cut-offs in power. 



5 Summary and discussion 

In this work, we attempted to probe the amplitude and shape of scalar primordial power spectrum 
(sPPS) using the CMB data. We fixed the values of various cosmological parameters (apart from the 
ones specifying the sPPS itself) and formulated the problem as an inverse problem. To solve the inverse 
problem, we use the maximum entropy method which is a non linear regularization method. There 
exist many possible ways to employ the maximum entropy regularization, we use a particular definition 
of entropy and a particular algorithm to solve the corresponding constrained non-linear optimization 
problem in a very large dimensional parameter space. 

The way we have formulated the problem, there exists a parameter (which we called A) whose value 
decides the location of global maximum of entropy in the space of all P{k)s. In the absence of any 
data, the algorithm shall just send every initial guess to the global maximum of entropy. Even in the 
presence of data, the following is worth noting 

1. at scales where 

(a) we have noisy data (so, little or no information), or, 

(b) the kernel (to be inverted) takes up negligible values (again too large or too small k values), 

the P{k) recovered by MEM depends on the value of A chosen (as P{k) = A is the ME solution), 
while at the scales where the data is good, we recover something which has comparatively lesser 
dependence on what A we choose. 

2. at scales at which the data is good, the P{k) recovered by MEM is consistent with a power law 
primordial power spectrum (with any possibly small deviations which we can not say anything 
about at this stage). This can be seen by comparing fig ([5]) with figs ^ and d?]). While the 
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existence of any small deviations from power law behaviour can not be completely ruled out, this 
analysis reinforces our belief that any such possible deviations must be small. 

This is by no means the last word on the existence of features in sPPS, this is not even the last 
word on the use of MEM for this purpose. The implementation of our algorithm to this problem till now 
does not seem to give any reason to believe that there are any serious deviations from the power law. 
We would like to mention that this is not completely unexpected, even in the light of existing papers 
such as (17j because the error bars at scales at which the features were recovered in those works are 
very large: the maximum entropy method can not claim any features at scales where the error bars are 
so large Q This analysis shows that at scales at which the CMB data is trustworthy, the Primordial 
Power Spectrum of scalar metric perturbations is, to a very good approximation, a power law. 

In future, one can look at the follwing prosoects. We should be able to solve this problem of 
possible existence of features in sPPS without assuming the values of other cosmological parameters 
(i.e. without formulating this problem as a simple inversion problem). Even in the present formulation, 
there may be ways of combining results from different values of A to get a better recovery. One may 
wish to use the actual WMAP likelihood (or rather, the corresponding Xes) ^ ^ measure of misfit, but 
this is not easy in the way we have attempted to solve the problem (we need to know the ^-nd its 
first two derivatives). Also, we have lost a lot of information in the process of binning the kernel and 
working with the binned, uncorrelated data. We would like to use all that lost information. Similarly, 
we have only used the TT angular power spectrum of CMB, we would also like to use the polarization 
spectra to probe the sPPS. We may also need to post-process the recovered sPPS to get more useful 
information. Another interesting possibility worth exploring is the connection of Maximum Entropy 
deconvolution with other ways of deconvolution (e.g. Richard Lucy deconvolution) . 

A Testing the method 

The main text described the material necessary to employ the maximum entropy inversion in any 
circumstance. The following points need to be noted (these are just tried and tested facts about the 
algorithm, many of which are illustrated here for the case of a toy problem shown in Fig ijlip . whose 
solution is given in Fig ()12p ): 

• If we did not have any data available, the optimization problem would have involved maximizing 
entropy subject to no constraints. In such a scenario, the solution we should get must he fi = A 
as that is where the global maximum of entropy is. 

• If the value of A is such that the of the global maximum of entropy is smaller than Caim; 
then fi = Ais itself the desired solution since "the data are too noisy for any information to be 
extracted" (see the last paragraph of page 113 of [21j). 

• It is not a surprise at all that choosing too small value of A should lead to negative value for 
entropy (the fact that depending upon the choice of A, sometimes we could be at locations in the 
parameter space with negative value of S has no impact on the solution of the problem) , see Fig 

(USD. 

• In Eq (|48|) . a = oo corresponds to the unconstrained maximization of S irrespective of C. If we 
are too close to the global maximum of entropy (/j = A), the value of a required to solve the 

^This "rules out" (or atleast renders them untrustworthy) many models of inflation considered in the literature in 
recent times. 
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Figure 11: A toy problem to test the algorithm. The signal, which has a bump gets completely smoothed 
after the application of the kernel (chosen to be a Lorenzian profile), a known amount of random noise 
is then added giving the final data. Fig (|12p illustrates the recovery with two distinct initial guesses. 




1st initial guess 
2ncl initial guess 
1 St recovery 
2nd recovery 



Figure 12: An illustration of the fact that even completely different initial guesses lead to the same 
final recovery (done for the toy problem of Fig llip . Notice that the location of the recovered bump and 
its amplitude are not exactly right: the quality of the recovery depends on many factors including the 
form of the kernel matrix itself. 
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Surfaces of constant 




Parameter 1 



Figure 13: The red line in this Fig is fk = A line for various values of A. Changing the value of A shall 
change the recovery because the point on the =constant surface with maximum entropy changes in 
the process. 



constrained optimization problem in the subspace (for Q defined by Eq ()48p ) shall become too 
large. In this situation, it may be difficult to numerically find any solution for a. 

• As long as we do not stay too close to the global maximum of entropy (so that numerical problems 
such as those stated in the previous point above do not turn up), the choice of the initial guess 
for running the algorithm is immaterial. That is, all the initial guesses shall lead to the same 
answer (see Fig (fT2]l ). 

• All the above problems can be easily avoided if we just choose a value of A s.t. the of fi = A 
configuration is much higher than the of initial guess (which better be more than Caim)- 
Notice that this is not a requirement, just a trick. Also, this does not help us in finding any 
unique preferable value of A. 

• For many kernels the exact value of Caim chosen does not matter as far as the recovered / is 
concerned, as long as the final value of 9 becomes sufficiently small compared to a unit radian, all 
recoveries with different final are almost the same. The x^ of signal (for a given realization) 
shall just fluctuate around (roughly) n^, we have tested that if Caim is set equal to Cgignah the 
recovery does not change. This happens to be true e.g. for the case of CMB kernel, the case of 
our interest. 

• The exact details of the shape of the final recovered solution does depend upon the actual value 
of A chosen: the = Caim surface can be thought of as a closed ellipsoidal surface in the Ug 
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dimensional /-space while as we change A, we define the line fi = Aas being the location of global 
maximum of entropy for these different values of A. This will of course mean that as we change 
A, the place where the entropy is maximum on the = C'aim surface shall also change. Thus, as 
we continuously change A, we shall get a family of recoveries (see fig (fT3|) ). So, the details of the 
recovered answer depends on the chosen value of this free (or adjustable) parameter. But since 
the global maximum of entropy is at fi = A, the value of A represents the "background" (i.e. a 
priori) knowledge of how much the power in various bins is, without using any knowledge of data 
at all. 

• Whether the recovery is good or bad, depends on the details of the kernel. For the case of CMB 
kernel, we have tested that the recovery is often quite good. 
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